Off-Policy Shaping Ensembles in Reinforcement Learning

نویسندگان

  • Anna Harutyunyan
  • Tim Brys
  • Peter Vrancx
  • Ann Nowé
چکیده

Recent advances of gradient temporal-difference methods allow to learn off-policy multiple value functions in parallel without sacrificing convergence guarantees or computational efficiency. This opens up new possibilities for sound ensemble techniques in reinforcement learning. In this work we propose learning an ensemble of policies related through potential-based shaping rewards. The ensemble induces a combination policy by using a voting mechanism on its components. Learning happens in real time, and we empirically show the combination policy to outperform the individual policies of the ensemble.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Off-Policy Reward Shaping with Ensembles

Potential-based reward shaping (PBRS) is an effective and popular technique to speed up reinforcement learning by leveraging domain knowledge. While PBRS is proven to always preserve optimal policies, its effect on learning speed is determined by the quality of its potential function, which, in turn, depends on both the underlying heuristic and the scale. Knowing which heuristic will prove effe...

متن کامل

Masters Thesis: Shaping Methods to Accelerate Reinforcement Learning:

Reinforcement learning (RL) is an attractive solution for deriving an optimal control policy by on-line exploration of the control task. In reinforcement learning there is no need to specify how the task is to be achieved. In fact, RL is a way of programming the agents by specifying a reward function. At every time step, the controller (agent) receives the process (environment) state, takes an ...

متن کامل

Ensemble Usage for More Reliable Policy Identification in Reinforcement Learning

Reinforcement learning (RL) methods employing powerful function approximators like neural networks have become an interesting approach for optimal control. Since they learn a policy from observations, they are also applicable when no analytical description of the system is available. Although impressive results have been reported, their handling in practice is still hard, as they can fail at re...

متن کامل

Policy Transfer using Reward Shaping

Transfer learning has proven to be a wildly successful approach for speeding up reinforcement learning. Techniques often use low-level information obtained in the source task to achieve successful transfer in the target task. Yet, a most general transfer approach can only assume access to the output of the learning algorithm in the source task, i.e. the learned policy, enabling transfer irrespe...

متن کامل

On-Policy vs. Off-Policy Updates for Deep Reinforcement Learning

Temporal-difference-based deep-reinforcement learning methods have typically been driven by off-policy, bootstrap Q-Learning updates. In this paper, we investigate the effects of using on-policy, Monte Carlo updates. Our empirical results show that for the DDPG algorithm in a continuous action space, mixing on-policy and off-policy update targets exhibits superior performance and stability comp...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014